On MMSE Estimation from Quantized 
Observations in the Nonasymptotic Regime 
Abstract —This paper studies MMSE estimation on the basis of 
quantized noisy observations. It presents nonasymptotic bounds 
on MMSE regret due to quantization for two settings: (1) 
estimation of a scalar random variable given a quantized vector 
of n conditionally independent observations, and (2) estimation 
of a p-dimensional random vector given a quantized vector of n 
observations (not necessarily independent) when the full MMSE 
estimator has a subgaussian concentration property. 

Index Terms —MMSE estimation, vector quantization, indirect 
rate distortion problems, nonasymptotic bounds. 

I. Introduction 

Minimum mean-square error (MMSE) estimation is a fundamental
primitive in communications, signal processing, and
data analytics. In today’s applications, the estimation task is
often performed on high-dimensional data collected, and possibly
preprocessed, at multiple remote locations. Consequently,
attention must be paid to communication constraints and their
impact on the MMSE.

One strategy for reducing the communication burden is to 
compress the observations using vector quantization (VQ). 
The idea of quantization for estimation and control can be 
traced back to the work of Curry [1], who considered in 
detail the jointly Gaussian case, derived a modification of the 
Kalman filter for use with quantized inputs, and developed 
approximation-based schemes for nonlinear systems. Because 
VQ inevitably introduces loss, it is of interest to characterize 
the resulting MMSE regret, i.e., the difference between the 
optimal performance achievable with quantized observations 
and the optimal performance that can be attained without 
quantization. The problem of designing an optimal vector 
quantizer to minimize the MMSE regret is equivalent to a 
noisy source coding problem with the quadratic fidelity criterion,
where the compressor acts on the vector of observations
and the decompressor generates an estimate of the target 
random vector. This equivalence was systematically studied 
by Wolf and Ziv [2], who showed that there is no loss of 
optimality if we first compute the full MMSE estimate and 
then compress it using an optimal quantizer. Ephraim and 
Gray [3] later extended this result to a more general class of 
weighted quadratic distortion functions and gave conditions for 
convergence of the Lloyd-type iterative algorithm for quantizer 
design. 
The main message of the above works is that the problem
of quantizer design for minimum MMSE regret is a functional
compression problem: the compressed representation of $X$
should retain as much information as possible about the
conditional mean $\eta(X) = \mathbb{E}[Y|X]$ of the target $Y$ given $X$,
also known as the regression function. However, there is an
interesting tension between the information-theoretic and the
statistical aspects of the problem: if the MMSE without quantization
is sufficiently small, then the dominant contribution
to the quantized MMSE should come from the quantization
error; in the opposite regime of high-rate quantization, the
quantized MMSE should not differ much from its unquantized
counterpart.

In this paper, we study both the statistical and the functional-
compression aspects of MMSE estimation with quantized
observations. In particular, we obtain sharp upper bounds on
the MMSE regret in two scenarios:

• MMSE estimation of a scalar random variable $Y$, where
the vector $X = (X_1, \ldots, X_n)$ of conditionally i.i.d.
observations is passed through a $k$-ary vector quantizer.
In this setting, under mild regularity conditions on the
conditional distribution $P_{X_1|Y}$, we obtain nonasymptotic
bounds on the MMSE regret. These bounds exhibit two
distinct behaviors depending on whether the number of
observations $n$ is larger than the square of the codebook
size $k$. In the asymptotic regime of $n, k \to \infty$, we
recover an existing result of Samarov and Has’minskii
[6], who were the first to address this problem.

• MMSE estimation of a $p$-dimensional random vector $Y$
when the $n$-dimensional vector of observations $X$ (not
necessarily independent) is passed through a $k$-ary VQ.
In this setting, we derive an upper bound on the MMSE
regret under the assumption that the $\ell_2$ norm of the
regression function exhibits subgaussian concentration
around its expected value $\mathbb{E}\|\eta(X)\|$. Unlike some of
the existing literature on high-resolution quantization for
functional compression (e.g., [4]), we do not require
smoothness of $\eta(X)$.

Notation. We will always denote by $\|\cdot\|$ the $\ell_2$ (Euclidean)
norm. A $k$-ary quantizer on the Euclidean space $\mathbb{R}^d$ is a Borel-
measurable mapping $q : \mathbb{R}^d \to [k]$, where $[k]$ is shorthand for
the set $\{1, \ldots, k\}$. Any such $q$ is characterized by its cells or
bins, i.e., the Borel sets $C_j = q^{-1}(\{j\}) = \{v \in \mathbb{R}^d : q(v) = j\}$,
$j \in [k]$. The set of all $k$-ary quantizers on $\mathbb{R}^d$ will be
denoted by $\mathcal{Q}^d_k$. A $k$-ary reconstruction function on $\mathbb{R}^d$ is a
mapping $f : [k] \to \mathbb{R}^d$. Any such $f$ is characterized by its
reconstruction points $c_j = f(j) \in \mathbb{R}^d$, $j \in [k]$. The set of
all $k$-ary reconstruction functions on $\mathbb{R}^d$ will be denoted by
$\mathcal{R}^d_k$. Any finite set of points $C = \{c_1, \ldots, c_k\} \subset \mathbb{R}^d$ defines a
quantizer $q_C \in \mathcal{Q}^d_k$ and a reconstruction function $f_C \in \mathcal{R}^d_k$ by

$$q_C(v) = \arg\min_{j \in [k]} \|v - c_j\|, \qquad \forall v \in \mathbb{R}^d,$$

and $f_C(j) = c_j$ for all $j \in [k]$. The composite mapping
$\bar{q}_C = f_C \circ q_C : \mathbb{R}^d \to \mathbb{R}^d$ is called a $k$-point nearest-neighbor
quantizer with codebook $C$. A well-known result is that, for
any random vector $V$ with $\mathbb{E}\|V\|^2 < \infty$,

$$\inf_{f \in \mathcal{R}^d_k} \inf_{q \in \mathcal{Q}^d_k} \mathbb{E}\|V - f(q(V))\|^2 = \inf_{C \subset \mathbb{R}^d :\, |C| = k} \mathbb{E}\|V - \bar{q}_C(V)\|^2,$$

and the infimum is actually a minimum. We refer the reader
to the survey article by Gray and Neuhoff [5] for more details.
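
As a concrete illustration of the nearest-neighbor quantizer just defined, the following minimal Python sketch maps a point to the index of its closest codebook entry and to the corresponding reconstruction point. The function names and the example codebook are hypothetical and chosen only for illustration; ties are broken toward the smallest index, which is one admissible convention among several.

```python
def nearest_neighbor_index(v, codebook):
    """Return the 1-based index j in [k] minimizing ||v - c_j||.

    `codebook` is a list of k points (tuples of equal length).
    Ties are broken toward the smallest index (an assumed convention).
    """
    def sqdist(a, b):
        # Squared Euclidean distance; it has the same minimizer as the norm.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(codebook)), key=lambda j: sqdist(v, codebook[j]))
    return best + 1


def quantize(v, codebook):
    """Composite map f_C(q_C(v)): replace v by its nearest codebook point."""
    return codebook[nearest_neighbor_index(v, codebook) - 1]


if __name__ == "__main__":
    C = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]  # hypothetical 3-point codebook
    print(nearest_neighbor_index((0.9, 0.8), C), quantize((3.0, 0.2), C))
```

The index map plays the role of the quantizer and the lookup plays the role of the reconstruction function; a vectorized implementation would be preferable for large codebooks.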

We use the following asymptotic order notation: for two sequences
$\{a_m\}$ and $\{b_m\}$, we write $a_m \lesssim b_m$ if $a_m = O(b_m)$,
and $a_m \asymp b_m$ if $a_m \lesssim b_m$ and $b_m \lesssim a_m$.

II. Performance criteria and some basic results 

Suppose two random vectors $X$ in $\mathbb{R}^n$ and $Y$ in $\mathbb{R}^p$ are
jointly distributed according to a given probability law $P_{XY}$.
The MMSE in estimating $Y$ as a function of $X$ is

$$\mathrm{mmse}(P_{XY}) = \inf_f \mathbb{E}\|Y - f(X)\|^2,$$

where the infimum over all Borel-measurable functions $f : \mathbb{R}^n \to \mathbb{R}^p$
is achieved by the regression function $\eta(x) = \mathbb{E}[Y|X = x]$.
We consider the problem of MMSE estimation
of $Y$ under the circumstances where only $q(X)$, a quantized
version of $X$, is accessible. Thus, for each $k \in \mathbb{Z}_+$, we are
interested in the MMSE functional

$$\mathrm{mmse}_k(P_{XY}) = \inf_{f \in \mathcal{R}^p_k} \inf_{q \in \mathcal{Q}^n_k} \mathbb{E}\|Y - f(q(X))\|^2.$$

This problem can be cast as one-shot fixed-rate lossy coding
of $Y$ with squared-error distortion when the encoder only has
access to $X$ (see, e.g., [2], [3]). We recall some known results
that explicitly involve the regression function $\eta(X)$. The first
one, due to Wolf and Ziv [2], is a useful decomposition of
$\mathrm{mmse}_k(P_{XY})$:

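Proposition 1. For every $k \in \mathbb{Z}_+$,

$$\mathrm{mmse}_k(P_{XY}) = \mathrm{mmse}(P_{XY}) + \mathrm{reg}_k(P_{XY}), \tag{1}$$

where

$$\mathrm{reg}_k(P_{XY}) = \inf_{q \in \mathcal{Q}^n_k} \mathbb{E}\,\big\|\mathbb{E}[Y|X] - \mathbb{E}[Y|q(X)]\big\|^2$$

is the MMSE regret due to quantization.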

Remark 1. A special case of this result for jointly Gaussian 
X and Y was obtained by Curry [1, Sec. 2.4]. 
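
The decomposition in Proposition 1 rests on an orthogonality identity that already holds for any fixed quantizer $q$: since $q(X)$ is a function of $X$, $\mathbb{E}|Y - \mathbb{E}[Y|q(X)]|^2 = \mathbb{E}|Y - \mathbb{E}[Y|X]|^2 + \mathbb{E}|\mathbb{E}[Y|X] - \mathbb{E}[Y|q(X)]|^2$, and taking the infimum over $q$ gives (1). The sketch below checks this identity numerically on a small discrete joint distribution. The pmf and the two-cell quantizer are made-up illustrative values, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical joint pmf of (X, Y): keys are (x, y), values are probabilities.
PMF = {(0, 0.0): 0.2, (0, 1.0): 0.1, (1, 1.0): 0.2,
       (1, 2.0): 0.1, (2, 2.0): 0.1, (2, 4.0): 0.3}


def cond_mean(pmf, key):
    """Return E[Y | key(X)] as a dict from key value to conditional mean."""
    num, den = defaultdict(float), defaultdict(float)
    for (x, y), p in pmf.items():
        num[key(x)] += p * y
        den[key(x)] += p
    return {g: num[g] / den[g] for g in den}


def q(x):
    """A fixed 2-ary quantizer of X (illustrative choice of cells)."""
    return 1 if x <= 1 else 2


eta = cond_mean(PMF, lambda x: x)   # regression function eta(x) = E[Y | X = x]
eta_q = cond_mean(PMF, q)           # E[Y | q(X)]

mmse = sum(p * (y - eta[x]) ** 2 for (x, y), p in PMF.items())
mse_q = sum(p * (y - eta_q[q(x)]) ** 2 for (x, y), p in PMF.items())
regret_q = sum(p * (eta[x] - eta_q[q(x)]) ** 2 for (x, y), p in PMF.items())

if __name__ == "__main__":
    # The two printed numbers agree up to floating-point rounding.
    print(mse_q, mmse + regret_q)
```

Here `regret_q` is the regret of this particular quantizer; the quantity $\mathrm{reg}_k$ in the proposition is its infimum over all $k$-ary quantizers.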

The second result, which was proved by Ephraim and Gray
[3] for a more general class of weighted quadratic distortion
functions, shows that there is no loss of optimality if we restrict
our attention to schemes of the following type: given $X$, we
first compute the regression function $\eta(X)$, quantize it using a
$k$-ary quantizer, and then estimate $Y$ by its conditional mean
given the cell index of $X$.

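Proposition 2.

$$\mathrm{reg}_k(P_{XY}) = \inf_{f \in \mathcal{R}^p_k} \inf_{q \in \mathcal{Q}^p_k} \mathbb{E}\,\|\eta(X) - f(q(\eta(X)))\|^2 \tag{2a}$$

$$\phantom{\mathrm{reg}_k(P_{XY})} = \inf_{C \subset \mathbb{R}^p :\, |C| = k} \mathbb{E}\,\|\eta(X) - \bar{q}_C(\eta(X))\|^2. \tag{2b}$$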

III. The case of n conditionally independent 
observations 

We now consider the situation when the coordinates
$X_1, \ldots, X_n$ of $X$ are independent and identically distributed
(i.i.d.) conditionally on $Y$. To keep things simple, we consider
the case when $Y$ is a scalar random variable (i.e., $p = 1$), and
its marginal distribution $P_Y$ has a probability density function
$f_Y$ supported on a compact interval $\mathcal{Y} = [-A, A]$. In the
asymptotic regime as $k \to \infty$ and $n \to \infty$, this setting was
investigated by Samarov and Has’minskii [6], who showed the
following under mild regularity conditions:

• If $n/k^2 \to \infty$, then the dominant contribution to
$\mathrm{mmse}_k(P_{XY})$ comes from the minimum expected distortion
incurred on $Y$ by any $k$-point quantizer [7], [8].

• If $n/k^2 \to 0$, then the dominant contribution to
$\mathrm{mmse}_k(P_{XY})$ comes from

$$\mathrm{mmse}(P_{XY}) \sim \frac{1}{n} \int_{\mathcal{Y}} \frac{f_Y(y)}{I(y)}\, dy,$$

where $I(y)$ is the Fisher information in the conditional
distribution $P_{X_1|Y=y}$ [9] (see below for definitions).

The proofs in [6] rely on the deep result, commonly referred to
as the Bernstein-von Mises theorem, which says that the posterior
distribution of the normalized error $\sqrt{nI(Y)}\,(\eta(X) - Y)$
is asymptotically standard normal (see, e.g., [10, Sec. 10.2]).

In this section, we establish a finite-$n$, finite-$k$ version of the
results of Samarov-Has’minskii using a recent nonasymptotic
generalization of the Bernstein-von Mises theorem due to
Spokoiny [11]. Our result requires a number of regularity
assumptions. The first two are standard:

(C.1) The conditional distributions $P_{X_1|Y=y}$, $y \in \mathcal{Y}$, are
dominated by a common $\sigma$-finite measure $\mu$ on the real
line. The corresponding log-density

$$\ell(u, y) = \log \frac{dP_{X_1|Y=y}}{d\mu}(u)$$

is twice differentiable in $y$ for every $u$. We will denote
the first and second derivatives with respect to $y$ by $\partial$
and $\partial^2$.

(C.2) For each $y \in \mathcal{Y}$, the Fisher information [12]
$$I(y) \triangleq -\partial^2\, \mathbb{E}[\ell(X_1, Y)\,|\,Y = y]$$
exists and is positive.





The remaining assumptions (see [13, Sec. 5.1]) are slightly
stronger than the classical ones. For each $r > 0$, define the
local neighborhoods

$$\mathcal{N}_r(y) = \left\{ y' \in \mathcal{Y} : \sqrt{I(y)}\,|y' - y| \le r \right\}.$$

The regularity assumptions can be split into two groups: 
identifiability conditions and exponential moment conditions. 
We start with the former: 

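(I.1) There are positive constants $r_0 > 0$ and $\delta^* > 0$, such
that, for every $r \le r_0$,

$$\sup_{y \in \mathcal{Y}} \sup_{y' \in \mathcal{N}_r(y)} \left| \frac{2 D(P_{X_1|Y=y} \,\|\, P_{X_1|Y=y'})}{I(y)(y - y')^2} - 1 \right| \le \delta^* r.$$

(I.2) There is a constant $b > 0$, such that, for every $r > 0$,

$$\inf_{y \in \mathcal{Y}} \inf_{y' \in \partial\mathcal{N}_r(y)} \frac{D(P_{X_1|Y=y} \,\|\, P_{X_1|Y=y'})}{I(y)(y - y')^2} \ge b.$$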


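The exponential moment assumptions pertain to the random
variables

$$\zeta_y(X_1, y') \triangleq \ell(X_1, y') - \mathbb{E}[\ell(X_1, y')\,|\,Y = y], \qquad y, y' \in \mathcal{Y}$$

[note that $\partial\zeta_y(X_1, y) = \partial\ell(X_1, y)$]. Let

$$\partial\zeta_y(X_1, y') = \frac{\partial}{\partial w} \zeta_y(X_1, w) \Big|_{w = y'}.$$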


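(E.1) There exist constants $g_1 > 0$ and $\nu_0 > 0$, such that

$$\sup_{y \in \mathcal{Y}} \log \mathbb{E}\left[ \exp\left( \lambda\, \frac{\partial\zeta_y(X_1, y)}{\sqrt{I(y)}} \right) \,\Big|\, Y = y \right] \le \frac{\nu_0^2 \lambda^2}{2}$$

for all $|\lambda| \le g_1$.

(E.2) There exists a constant $\omega^* > 0$, such that, for every
$r \le r_0$,

$$\sup_{y \in \mathcal{Y}} \sup_{y' \in \mathcal{N}_r(y)} \log \mathbb{E}\left[ \exp\left( \lambda\, \frac{\partial\zeta_y(X_1, y') - \partial\zeta_y(X_1, y)}{\omega^* r \sqrt{I(y)}} \right) \,\Big|\, Y = y \right] \le \frac{\nu_0^2 \lambda^2}{2}$$

for all $|\lambda| \le g_1$.

(E.3) For every $r > 0$ there exists $g_1(r) > 0$, such that

$$\sup_{y \in \mathcal{Y}} \sup_{y' \in \mathcal{N}_r(y)} \log \mathbb{E}\left[ \exp\left( \lambda\, \frac{\partial\zeta_y(X_1, y')}{\sqrt{I(y)}} \right) \,\Big|\, Y = y \right] \le \frac{\nu_0^2 \lambda^2}{2}$$

for all $|\lambda| \le g_1(r)$.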

For example, in the case of additive Gaussian noise, i.e., when
$P_{X_1|Y=y} = \mathcal{N}(y, \sigma^2)$ for all $y \in \mathcal{Y}$, it is easy to verify that
all of these assumptions are met.

We are now ready to state and prove the main result of
this section. Since both $Y$ and $\eta(X)$ are supported on the
bounded interval $[-A, A]$, there is no loss of generality in
restricting our attention only to nearest-neighbor quantizers
$\bar{q}_C$ with codebooks $C = \{y_1, \ldots, y_k\} \subset [-A, A]$. Moreover,
we can assume that $C$ is ordered in such a way that
$-A = y_0 \le y_1 < y_2 < \cdots < y_k \le y_{k+1} = A$. Given such an
ordered $C$, we define

$$\Delta_C = \max_{0 \le j \le k} (y_{j+1} - y_j).$$
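
The maximal gap defined above is straightforward to compute for a given codebook. The following minimal Python sketch pads the sorted codebook with the endpoints of the interval $[-A, A]$ and returns the largest spacing; the function name `max_gap` and the example values are illustrative assumptions.

```python
def max_gap(codebook, A):
    """Largest spacing between consecutive points of {-A} U C U {A}.

    `codebook` is an iterable of reconstruction points in [-A, A];
    the padding with -A and A mirrors the convention y_0 = -A, y_{k+1} = A.
    """
    pts = [-A] + sorted(codebook) + [A]
    return max(b - a for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Uniform 3-point codebook on [-1, 1]: every gap equals 0.5.
    print(max_gap([-0.5, 0.0, 0.5], 1.0))
```

For a uniformly spaced $k$-point codebook the gap scales like $1/k$, which is the scaling used later for the optimal codebook.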


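Theorem 1. Suppose that Assumptions (C.1)-(C.2), (I.1)-
(I.2), and (E.1)-(E.3) hold. Suppose also that $\log f_Y$ is Lipschitz
on $[-A, A]$. Then there exists a constant $L > 0$ that
depends only on the constants in the above assumptions,
such that, for any $k$-point nearest-neighbor quantizer $\bar{q}_C$ with
$C \subset \mathcal{Y}$, we have

$$\mathbb{E}|\eta(X) - \bar{q}_C(\eta(X))|^2 - \mathbb{E}|Y - \bar{q}_C(Y)|^2 \le L \Delta_C^2 \min\left\{ 1,\ \frac{1}{\Delta_C \sqrt{n}} \left( \mathbb{E}\left[ \frac{1}{\sqrt{I(Y)}} \right] + \sqrt{\mathrm{mmse}(P_{XY})} \right) \right\}. \tag{3}$$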

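In the additive Gaussian noise case, the bound (3) becomes

$$\mathbb{E}|\eta(X) - \bar{q}_C(\eta(X))|^2 - \mathbb{E}|Y - \bar{q}_C(Y)|^2 \le L \Delta_C^2 \min\left\{ 1,\ \frac{1}{\Delta_C \sqrt{n}} \left( \sigma + \sqrt{\mathrm{mmse}(P_{XY})} \right) \right\},$$

where $\sigma^2$ is the noise variance.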

Proof: The idea of the proof of our nonasymptotic
result is actually rather simple, unlike that of Samarov and
Has’minskii [6], which requires a number of delicate asymptotic
approximations and several fairly tedious integrations.
For any collection $C = \{y_1, \ldots, y_k\}$ of $k$ reconstruction
points, define the function $e_C : \mathcal{Y} \to \mathbb{R}_+$ by

$$e_C(y) = \min_{j \in [k]} (y - y_j)^2.$$

Then a simple calculation shows that

$$|e_C(y) - e_C(y')| \le \min\left\{ 2\Delta_C^2,\ 2\Delta_C |y - y'| \right\}, \tag{4}$$

for all $y, y' \in \mathcal{Y}$. The expected reconstruction error of the
nearest-neighbor quantizer $\bar{q}_C$ can be written as

$$\mathbb{E}|\eta(X) - \bar{q}_C(\eta(X))|^2 = \mathbb{E}[e_C(\eta(X))].$$

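Using the law of iterated expectation and the smoothness
estimate (4), we obtain

$$\big| \mathbb{E}[e_C(\eta(X))] - \mathbb{E}[e_C(Y)] \big| \le \mathbb{E}\big| \mathbb{E}[e_C(\eta(X)) - e_C(Y)\,|\,Y] \big| \le 2\Delta_C^2\, \mathbb{E} \min\left\{ 1,\ \frac{1}{\Delta_C} \mathbb{E}\big[ |\eta(X) - Y| \,\big|\, Y \big] \right\}. \tag{5}$$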

We now invoke Spokoiny’s nonasymptotic Bernstein-von
Mises theorem [11]. Let $Z_n = \eta(X) - (Y + G_n(X, Y))$, where

$$G_n(X, Y) \triangleq \frac{1}{n I(Y)} \sum_{i=1}^{n} \partial\ell(X_i, Y),$$

and for any L > 0 consider the event 

A^{Y) A |v47(F)|E„| < L ' I , 
Then there exists a choice $L = L_0$ that depends only on the
constants in the regularity conditions, such that

$$\mathbb{P}[A_n(Y)\,|\,Y] \ge 1 - \frac{C}{n}, \tag{6}$$

where $A_n(Y) = A_n^{L_0}(Y)$, and $C > 0$ is an absolute constant
[11, Sec. 2.4.3]. Therefore,


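$$\begin{aligned}
\mathbb{E}\big[|\eta(X) - Y| \,\big|\, Y\big]
&\overset{(a)}{\le} \mathbb{E}\big[|Z_n| \,\big|\, Y\big] + \mathbb{E}\big[|G_n(X, Y)| \,\big|\, Y\big] \\
&= \mathbb{E}\big[1\{A_n(Y)\} |Z_n| \,\big|\, Y\big] + \mathbb{E}\big[1\{A_n^c(Y)\} |Z_n| \,\big|\, Y\big] + \mathbb{E}\big[|G_n(X, Y)| \,\big|\, Y\big] \\
&\overset{(b),(c)}{\le} \frac{L_0}{\sqrt{n I(Y)}} + \sqrt{\frac{2C}{n}\, \mathbb{E}\big[|Y - \eta(X)|^2 \,\big|\, Y\big]} + \sqrt{\frac{2C}{n}\, \mathbb{E}\big[|G_n(X, Y)|^2 \,\big|\, Y\big]} + \mathbb{E}\big[|G_n(X, Y)| \,\big|\, Y\big],
\end{aligned} \tag{7}$$

where (a) follows from the triangle inequality, (b) from (6) and
Cauchy-Schwarz, and (c) again from the triangle inequality.
Now, since $\mathbb{E}[\partial\ell(X_1, Y)\,|\,Y] = 0$ and $\mathrm{var}[\partial\ell(X_1, Y)\,|\,Y] = I(Y)$ [12], we have

$$\mathbb{E}\big[|G_n(X, Y)|^2 \,\big|\, Y\big] = \mathrm{var}[G_n(X, Y)\,|\,Y] = \frac{1}{n I(Y)}$$

and

$$\mathbb{E}\big[|G_n(X, Y)| \,\big|\, Y\big] \le \frac{1}{\sqrt{n I(Y)}}.$$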

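Substituting these estimates into (7) gives

$$\mathbb{E}\big[|\eta(X) - Y| \,\big|\, Y\big] \le \frac{L_0 + 1}{\sqrt{n I(Y)}} + \frac{\sqrt{2C}}{n \sqrt{I(Y)}} + \sqrt{\frac{2C\, \mathbb{E}\big[|Y - \eta(X)|^2 \,\big|\, Y\big]}{n}}.$$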


Plugging this bound into (5), using Jensen’s inequality, and 
simplifying, we get (3). ■ 


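Corollary 1. Under the same assumptions as in Theorem 1,

$$\mathrm{reg}_k(P_{XY}) - \min_{C \subset \mathcal{Y} :\, |C| = k} \mathbb{E}|Y - \bar{q}_C(Y)|^2 \lesssim \min\left\{ \frac{1}{k^2},\ \frac{1}{k\sqrt{n}} \left( \mathbb{E}\left[ \frac{1}{\sqrt{I(Y)}} \right] + \sqrt{\mathrm{mmse}(P_{XY})} \right) \right\}. \tag{8}$$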


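Remark 2. Using the information inequality [9]

$$\mathrm{mmse}(P_{XY}) \ge \frac{1}{n}\, \mathbb{E}\left[ \frac{1}{I(Y)} \right]$$

and Jensen’s inequality, the bound (8) can be weakened to

$$\mathrm{reg}_k(P_{XY}) - \min_{C \subset \mathcal{Y} :\, |C| = k} \mathbb{E}|Y - \bar{q}_C(Y)|^2 \lesssim \min\left\{ \frac{1}{k^2},\ \frac{\sqrt{\mathrm{mmse}(P_{XY})}}{k} \right\}.$$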

Proof: Let $C^* = \{y_1^*, \ldots, y_k^*\}$ be the reconstruction points
of an optimal $k$-point quantizer for $Y$ arranged in increasing
order. From the work of Panter and Dite [7], we know that
these points should be chosen in such a way that

$$\int_{y_j^*}^{y_{j+1}^*} f_Y(y)^{1/3}\, dy = \frac{c_j}{k} \int_{\mathcal{Y}} f_Y(y)^{1/3}\, dy$$

for all $j = 0, 1, \ldots, k - 1$, where $c_j = 1/2$ for $j = 0$ and $1$
otherwise. Therefore, $\Delta_{C^*} \asymp 1/k$. Using this and the definition
of the MMSE regret in (3), we get (8). ■

Observe that the value of the right-hand side of (8) is
determined by whether the number of observations $n$ is larger
or smaller than $k^2$. This agrees with the asymptotic results of
Samarov and Has’minskii [6].
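
To make the crossover at $n \approx k^2$ tangible, the sketch below evaluates the two terms of the weakened bound from Remark 2, $1/k^2$ and $\sqrt{\mathrm{mmse}}/k$, in the additive Gaussian noise case. It is only an illustration under stated assumptions: hidden constants are ignored, and $\mathrm{mmse}(P_{XY})$ is replaced by the surrogate $\sigma^2/n$, which is the mean-square error of the sample mean and hence an upper bound on the MMSE. The function name `bound_terms` is hypothetical.

```python
import math


def bound_terms(n, k, sigma=1.0):
    """Return (quantization term, estimation term) of the weakened bound.

    quantization term: 1 / k^2
    estimation term:   sqrt(mmse) / k with mmse replaced by sigma^2 / n
                       (assumed surrogate, constants ignored)
    """
    quant = 1.0 / k ** 2
    stat = sigma / (k * math.sqrt(n))
    return quant, stat


if __name__ == "__main__":
    for n in (16, 25, 100):
        quant, stat = bound_terms(n, k=5)
        smaller = "estimation" if stat < quant else "quantization"
        print(n, quant, stat, smaller)
```

With $\sigma = 1$ and $k = 5$, the estimation term is the smaller of the two once $n$ exceeds $k^2 = 25$, and the quantization term is the smaller one below that threshold.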


IV. A HIGH-RESOLUTION BOUND FOR FUNCTIONAL 
COMPRESSION 


We now consider a general setting of $p$-dimensional $Y$
and $n$-dimensional $X$ without any conditional independence
assumptions. Instead, we focus on the scaling of the MMSE
regret $\mathrm{reg}_k(P_{XY})$ with $k$, while the values of $n$ and $p$ stay
fixed. Proposition 1 shows that the MMSE regret is precisely
the minimum expected distortion attainable by $k$-ary quantization
of $X$ in the problem of functional compression of
the regression function $\eta(X)$. The results we present in this
section require only some regularity assumptions on the $\ell_2$
norm $\|\eta(X)\|$:

Assumption 1. The random variable $\|\eta(X)\|$ has a finite
fourth moment: $\mathbb{E}\|\eta(X)\|^4 < \infty$.

Assumption 2. The random variable $\|\eta(X)\|$ is subgaussian:
there exists a positive constant $v > 0$, such that

$$\log \mathbb{E}\left[ e^{\lambda(\|\eta(X)\| - \mathbb{E}\|\eta(X)\|)} \right] \le \frac{v \lambda^2}{2}, \qquad \forall \lambda \in \mathbb{R}.$$

One sufficient condition for Assumption 1 to hold is for $\|Y\|$
to have a finite fourth moment. Indeed, in that case, using
Jensen’s inequality, we have $\mathbb{E}\|\eta(X)\|^4 = \mathbb{E}\|\mathbb{E}[Y|X]\|^4 \le \mathbb{E}\|Y\|^4 < \infty$.
As for Assumption 2, it will be met, for
example, if the regression function $\eta$ is Lipschitz, i.e., if there
exists some finite constant $L > 0$, such that

$$\|\eta(x) - \eta(x')\| \le L \|x - x'\|, \qquad \forall x, x' \in \mathbb{R}^n,$$

and if $X$ is a Gaussian random vector with a nonsingular
covariance matrix [14].

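Theorem 2. Suppose Assumption 1 holds. Then

$$\mathrm{reg}_k(P_{XY}) \lesssim \left( \mathbb{E}\|\eta(X)\|^2\, \mathbb{E}\|\eta(X)\|^4 \right)^{1/3} k^{-2/(3p)}. \tag{9}$$

If Assumption 2 also holds, then

$$\mathrm{reg}_k(P_{XY}) \lesssim \inf_{r > \mathbb{E}\|\eta(X)\|} \left( r^2 k^{-2/p} + \sqrt{\mathbb{E}\|\eta(X)\|^4}\; e^{-(r - \mathbb{E}\|\eta(X)\|)^2 / 4v} \right). \tag{10}$$
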
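Remark 3. Here, $p$ can be replaced by a suitable intrinsic
dimension of the support of $\eta(X)$ (e.g., the rate-distortion
dimension [15]). For example, if $\eta(X)$ is linear, i.e., $\mathbb{E}[Y|X] = AX$
for some deterministic matrix $A \in \mathbb{R}^{p \times n}$, then we can
replace $p$ by $\mathrm{rank}(A)$. Moreover, using a suboptimal value of
$r$, we can weaken the bound in (10) to

$$\mathrm{reg}_k(P_{XY}) \lesssim \frac{\log k}{k^{2/p}},$$

where the hidden constant depends on $p$, on the first and fourth
moments of $\|\eta(X)\|$, and on the subgaussian constant $v$. Apart
from the logarithmic factor, this scaling of the MMSE regret
agrees with the high-resolution approximation for VQ [5] and
with the Shannon lower bound [15], [16].

Proof: From Assumption 1 and from Jensen’s inequality,
it follows that the first and second moments of $\|\eta(X)\|$ are
also finite.

Fix a positive real constant $r > 0$, which will be optimized
later. A simple volumetric estimate shows that the ball of
radius $r$ in $\mathbb{R}^p$ can be covered by at most $(1 + 2r/\varepsilon)^p$ balls of
radius $\varepsilon$. So, for a given $k \in \mathbb{Z}_+$, we can cover the radius-$r$ ball
by $k$ balls of radius $\varepsilon \asymp r k^{-1/p}$. Let $c_1, \ldots, c_k$ be the centers
of these $k$ balls. We now construct a quantizer $q^{(r)} \in \mathcal{Q}^n_{k+1}$
as follows:

$$q^{(r)}(x) = \begin{cases} \arg\min_{j \in [k]} \|\eta(x) - c_j\|, & \text{if } \|\eta(x)\| \le r \\ k + 1, & \text{otherwise.} \end{cases}$$

For this quantizer, we have

$$\mathbb{E}\big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^2 = T_1 + T_2,$$

where

$$T_1 \triangleq \mathbb{E}\left[ 1\{\|\eta(X)\| \le r\}\, \big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^2 \right],$$

$$T_2 \triangleq \mathbb{E}\left[ 1\{\|\eta(X)\| > r\}\, \big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^2 \right].$$

For the first term, we have

$$T_1 \le 4\varepsilon^2\, \mathbb{P}[\|\eta(X)\| \le r] \le 4\varepsilon^2 \asymp r^2 k^{-2/p},$$

where the first inequality is true since $\eta(X)$ and
$\mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]$ are inside the same $\varepsilon$-ball in the covering
for all $X$ such that $q^{(r)}(X) \in [k]$. For the second term,

$$\begin{aligned}
T_2 &\le \sqrt{\mathbb{P}[\|\eta(X)\| > r]}\, \sqrt{\mathbb{E}\big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^4} \\
&\le \sqrt{\mathbb{P}[\|\eta(X)\| > r]}\, \sqrt{\mathbb{E}\big( \|\eta(X)\| + \|\mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\| \big)^4} \\
&\le \sqrt{\mathbb{P}[\|\eta(X)\| > r]}\, \sqrt{8\,\mathbb{E}\|\eta(X)\|^4 + 8\,\mathbb{E}\big\|\mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^4} \\
&\le \sqrt{\mathbb{P}[\|\eta(X)\| > r]}\, \sqrt{16\,\mathbb{E}\|\eta(X)\|^4},
\end{aligned}$$

where the first line is by Cauchy-Schwarz inequality, and the
remaining steps follow from monotonicity and convexity.

By Markov’s inequality,

$$\mathbb{P}[\|\eta(X)\| > r] = \mathbb{P}[\|\eta(X)\|^2 > r^2] \le \frac{\mathbb{E}\|\eta(X)\|^2}{r^2}.$$

Therefore,

$$\mathrm{reg}_{k+1}(P_{XY}) \le \mathbb{E}\big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^2 \lesssim r^2 k^{-2/p} + \frac{\sqrt{\mathbb{E}\|\eta(X)\|^2\, \mathbb{E}\|\eta(X)\|^4}}{r}.$$

Optimizing over all $r > 0$, we get (9). Now suppose Assumption 2
also holds. Let $r = \mathbb{E}\|\eta(X)\| + t$ for some $t > 0$. Since
$\|\eta(X)\|$ is subgaussian, the Chernoff bounding technique gives

$$\mathbb{P}[\|\eta(X)\| > r] = \mathbb{P}[\|\eta(X)\| - \mathbb{E}\|\eta(X)\| > t] \le e^{-(r - \mathbb{E}\|\eta(X)\|)^2 / 2v}$$

(see, e.g., [14, Chap. 3]). Thus, we have

$$\mathrm{reg}_{k+1}(P_{XY}) \le \mathbb{E}\big\|\eta(X) - \mathbb{E}[\eta(X)\,|\,q^{(r)}(X)]\big\|^2 \lesssim r^2 k^{-2/p} + \sqrt{\mathbb{E}\|\eta(X)\|^4}\; e^{-(r - \mathbb{E}\|\eta(X)\|)^2 / 4v}$$

for all $r > \mathbb{E}\|\eta(X)\|$. Optimizing over $r$, we get (10). ■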
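
The weakening of (10) mentioned in Remark 3 can be made explicit with one particular suboptimal choice of $r$. The derivation below is a sketch under the assumption that (10) has the form $r^2 k^{-2/p} + \sqrt{\mathbb{E}\|\eta(X)\|^4}\, e^{-(r - \mathbb{E}\|\eta(X)\|)^2/4v}$; the shorthand $m$ and $M_4$ and the specific choice of $t$ are introduced here for illustration and are not taken from the paper.

```latex
% Shorthand (introduced here): m = E||eta(X)||,  M_4 = E||eta(X)||^4.
% Assumed form of (10): reg_k <~ inf_{r > m} ( r^2 k^{-2/p} + sqrt(M_4) e^{-(r-m)^2/(4v)} ).
\begin{align*}
r &= m + t, \qquad t^2 = \frac{8v}{p}\log k
  \;\Longrightarrow\; e^{-t^2/(4v)} = e^{-(2/p)\log k} = k^{-2/p}, \\
r^2 &\le 2m^2 + 2t^2 = 2m^2 + \frac{16v}{p}\log k, \\
\mathrm{reg}_k(P_{XY}) &\lesssim \Big( 2m^2 + \frac{16v}{p}\log k + \sqrt{M_4} \Big)\, k^{-2/p}
  \;\lesssim\; \frac{\log k}{k^{2/p}} \qquad (k \ge 2).
\end{align*}
```

The hidden constant in the last step depends on $p$, $m$, $M_4$, and $v$, in line with the dependence stated in Remark 3.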